Robot Learning with State-Dependent Exploration

Authors

  • Thomas Rückstieß
  • Martin Felder
  • Frank Sehnke
  • Jürgen Schmidhuber
Abstract

Policy gradient algorithms are among the few learning methods successfully applied to demanding real-world problems including those found in the field of robotics. While Likelihood Ratio (LR) methods are typically used to estimate the gradient, they suffer from high variance due to random exploration at each timestep during the rollout. We therefore evaluate several policy gradient methods with state-dependent exploration (SDE), a recently introduced alternative to random exploration, which deterministically returns the same action for a given state during one episode. We apply SDE to a simulated robotics task with realistically modelled physics, and compare it to random exploration within several different learning schemes. Our experiments show that SDE outperforms traditional random exploration in almost every case.
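The contrast between the two exploration schemes can be illustrated with a minimal sketch. The abstract states only that SDE returns the same action for a given state within one episode. The linear controller, the linear form of the exploration term, the Gaussian resampling at episode start, and all names below (`linear_policy`, `RandomExploration`, `StateDependentExploration`, `act`) are assumptions made for illustration, not the authors' implementation.

```python
import random


def linear_policy(theta, state):
    """Deterministic linear controller: action = theta . state (assumed form)."""
    return sum(t * s for t, s in zip(theta, state))


class RandomExploration:
    """Adds freshly drawn Gaussian noise to the action at every timestep."""

    def __init__(self, sigma, rng):
        self.sigma = sigma
        self.rng = rng

    def start_episode(self):
        pass  # nothing to resample: noise is drawn anew on every step

    def noise(self, state):
        return self.rng.gauss(0.0, self.sigma)


class StateDependentExploration:
    """Draws exploration parameters once per episode (assumed scheme).

    Within an episode the perturbation is a fixed function of the state,
    so the same state always yields the same action.
    """

    def __init__(self, sigma, state_dim, rng):
        self.sigma = sigma
        self.state_dim = state_dim
        self.rng = rng
        self.theta_hat = [0.0] * state_dim

    def start_episode(self):
        # Resample the exploration parameters; they stay fixed until the
        # next call to start_episode().
        self.theta_hat = [
            self.rng.gauss(0.0, self.sigma) for _ in range(self.state_dim)
        ]

    def noise(self, state):
        return linear_policy(self.theta_hat, state)


def act(theta, explorer, state):
    """Exploratory action: deterministic policy output plus exploration term."""
    return linear_policy(theta, state) + explorer.noise(state)


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [0.5, -0.2]   # hypothetical policy parameters
    state = [1.0, 2.0]    # hypothetical state

    sde = StateDependentExploration(sigma=0.1, state_dim=2, rng=rng)
    sde.start_episode()
    print("SDE, same episode:", act(theta, sde, state), act(theta, sde, state))

    rand = RandomExploration(sigma=0.1, rng=rng)
    rand.start_episode()
    print("Random exploration:", act(theta, rand, state), act(theta, rand, state))
```

In this sketch the two SDE actions printed for the same state are identical, while the two random-exploration actions differ. That per-step noise is the source of the high variance the abstract attributes to random exploration.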

Similar resources

Exploring parameter space in reinforcement learning

This paper discusses parameter-based exploration methods for reinforcement learning. Parameter-based methods perturb parameters of a general function approximator directly, rather than adding noise to the resulting actions. Parameter-based exploration unifies reinforcement learning and black-box optimization, and has several advantages over action perturbation. We review two recent parameter-ex...

A Q-learning Based Continuous Tuning of Fuzzy Wall Tracking

A simple, easy-to-implement algorithm is proposed to address the wall-tracking task of an autonomous robot. The robot should navigate in unknown environments, find the nearest wall, and track it based solely on locally sensed data. The proposed method couples fuzzy logic with Q-learning to meet the requirements of autonomous navigation. Fuzzy if-then rules provide a reliable decision maki...

Visual Tracking using Learning Histogram of Oriented Gradients by SVM on Mobile Robot

The intelligence of a mobile robot is highly dependent on its vision. The main objective of an intelligent mobile robot lies in its ability to perform online image processing, object detection, and especially visual tracking, which is a complex task in stochastic environments. Tracking algorithms suffer from sequence challenges such as illumination variation, occlusion, and background clutter, so an a...

Robot Learning: Exploration and Continuous Domains

The goal of this workshop was to discuss two major issues: efficient exploration of a learner's state space, and learning in continuous domains. The common themes that emerged in presentations and in discussion were the importance of choosing one's domain assumptions carefully, mixing controllers/strategies, avoidance of catastrophic failure, new approaches with difficulties with reinforcement ...

Upper Confidence Weighted Learning for Efficient Exploration in Multiclass Prediction with Binary Feedback

We introduce a novel algorithm called Upper Confidence Weighted Learning (UCWL) for online multiclass learning from binary feedback. UCWL combines the Upper Confidence Bound (UCB) framework with the Soft Confidence Weighted (SCW) online learning scheme. UCWL achieves state of the art performance (especially on noisy and nonseparable data) with low computational costs. Estimated confidence inter...

Publication date: 2008